76 research outputs found

    Improved Smoothed Analysis of the k-Means Method

    The k-means method is a widely used clustering algorithm. One of its distinguished features is its speed in practice. Its worst-case running-time, however, is exponential, leaving a gap between practical and theoretical performance. Arthur and Vassilvitskii (FOCS 2006) aimed at closing this gap, and they proved a bound of $\mathrm{poly}(n^k, \sigma^{-1})$ on the smoothed running-time of the k-means method, where $n$ is the number of data points and $\sigma$ is the standard deviation of the Gaussian perturbation. This bound, though better than the worst-case bound, is still much larger than the running-time observed in practice. We improve the smoothed analysis of the k-means method by showing two upper bounds on the expected running-time of k-means. First, we prove that the expected running-time is bounded by a polynomial in $n^{\sqrt{k}}$ and $\sigma^{-1}$. Second, we prove an upper bound of $k^{kd} \cdot \mathrm{poly}(n, \sigma^{-1})$, where $d$ is the dimension of the data space. The polynomial is independent of $k$ and $d$, and we obtain a polynomial bound for the expected running-time for $k, d \in O(\sqrt{\log n/\log \log n})$. Finally, we show that k-means runs in smoothed polynomial time for one-dimensional instances. Comment: To be presented at the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA 2009).
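To make the quantity being bounded concrete, the following is a minimal sketch of the k-means method (Lloyd's iterations) run on a Gaussian-perturbed input, counting iterations until the assignment stops changing. The function names (`perturb`, `kmeans_iterations`), the empty-cluster rule, and the iteration cap are assumptions made for illustration; they are not taken from the paper.

```python
import random


def perturb(points, sigma, rng):
    """Smoothed-analysis input: add independent Gaussian noise (std. dev. sigma) to each coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_iterations(points, centers, max_iter=10_000):
    """Run the k-means method; return (centers, number of iterations that changed the assignment)."""
    centers = list(centers)
    assignment = None
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centers, it - 1  # nothing changed, so the method has converged
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            cluster = [p for p, a in zip(points, assignment) if a == j]
            if cluster:  # assumption: an empty cluster keeps its old center
                dim = len(cluster[0])
                centers[j] = tuple(
                    sum(p[i] for p in cluster) / len(cluster) for i in range(dim)
                )
    return centers, max_iter


if __name__ == "__main__":
    rng = random.Random(0)
    base = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    noisy = perturb(base, sigma=0.1, rng=rng)
    _, iterations = kmeans_iterations(noisy, centers=noisy[:2])
    print("iterations on perturbed input:", iterations)
```

The smoothed running-time discussed in the abstract is the expectation of this iteration count over the random perturbation, maximized over the unperturbed input.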

    The Alternating Stock Size Problem and the Gasoline Puzzle

    Given a set S of integers whose sum is zero, consider the problem of finding a permutation of these integers such that: (i) all prefix sums of the ordering are nonnegative, and (ii) the maximum value of a prefix sum is minimized. Kellerer et al. referred to this problem as the "Stock Size Problem" and showed that it can be approximated to within 3/2. They also showed that an approximation ratio of 2 can be achieved via several simple algorithms. We consider a related problem, which we call the "Alternating Stock Size Problem", where the numbers of positive and negative integers in the input set S are equal. The problem is the same as above, but we are additionally required to alternate the positive and negative numbers in the output ordering. This problem also has several simple 2-approximations. We show that it can be approximated to within 1.79. Then we show that this problem is closely related to an optimization version of the gasoline puzzle due to Lovász, in which we want to minimize the size of the gas tank necessary to go around the track. We present a 2-approximation for this problem, using a natural linear programming relaxation whose feasible solutions are doubly stochastic matrices. Our novel rounding algorithm is based on a transformation that yields another doubly stochastic matrix with special properties, from which we can extract a suitable permutation.
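The objective described above can be illustrated with a short sketch that evaluates an ordering and finds the optimum of a tiny instance by exhaustive search. This is not one of the approximation algorithms from the paper; the function names are hypothetical, and the code assumes the input contains no zeros.

```python
from itertools import permutations


def max_prefix_sum(order):
    """Return the largest prefix sum of the ordering, or None if some prefix sum is negative."""
    total, peak = 0, 0
    for x in order:
        total += x
        if total < 0:
            return None  # violates condition (i)
        peak = max(peak, total)
    return peak


def is_alternating(order):
    """True if positive and negative numbers alternate, starting with a positive one."""
    return all((x > 0) == (i % 2 == 0) for i, x in enumerate(order))


def brute_force(items, alternating=False):
    """Exhaustive search over all orderings; only feasible for very small inputs."""
    best = None
    for order in permutations(items):
        if alternating and not is_alternating(order):
            continue
        value = max_prefix_sum(order)
        if value is not None and (best is None or value < best[0]):
            best = (value, order)
    return best


if __name__ == "__main__":
    items = [3, 2, -1, -4]  # sums to zero, two positive and two negative numbers
    print(brute_force(items))
    print(brute_force(items, alternating=True))
```

An alternating ordering has to start with a positive number, because otherwise the first prefix sum would already be negative.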

    Worst Case and Probabilistic Analysis of the 2-Opt Algorithm for the TSP

    2-Opt is probably the most basic local search heuristic for the TSP. This heuristic achieves amazingly good results on real world Euclidean instances both with respect to running time and approximation ratio. There are numerous experimental studies on the performance of 2-Opt. However, the theoretical knowledge about this heuristic is still very limited. Not even its worst case running time on 2-dimensional Euclidean instances was known so far. We clarify this issue by presenting, for every $p\in\mathbb{N}$, a family of $L_p$ instances on which 2-Opt can take an exponential number of steps. Previous probabilistic analyses were restricted to instances in which $n$ points are placed uniformly at random in the unit square $[0,1]^2$. We consider a more advanced model in which the points can be placed independently according to general distributions on $[0,1]^d$, for an arbitrary $d\ge 2$. In particular, we allow different distributions for different points. We study the expected number of local improvements in terms of the number $n$ of points and the maximal density $\phi$ of the probability distributions. We show an upper bound on the expected length of any 2-Opt improvement path of $\tilde{O}(n^{4+1/3}\cdot\phi^{8/3})$. When starting with an initial tour computed by an insertion heuristic, the upper bound on the expected number of steps improves even to $\tilde{O}(n^{4+1/3-1/d}\cdot\phi^{8/3})$. If the distances are measured according to the Manhattan metric, then the expected number of steps is bounded by $\tilde{O}(n^{4-1/d}\cdot\phi)$. In addition, we prove an upper bound of $O(\sqrt[d]{\phi})$ on the expected approximation factor with respect to all $L_p$ metrics.
    Let us remark that our probabilistic analysis covers as special cases the uniform input model with $\phi=1$ and a smoothed analysis with Gaussian perturbations of standard deviation $\sigma$ with $\phi\sim1/\sigma^d$. Comment: An extended abstract of this work has appeared in the Proc. of the 18th ACM-SIAM Symposium on Discrete Algorithms. The results of this extended abstract have been split into two articles (Algorithmica 2014) and (ACM Transactions on Algorithms 2016). This report is an updated version of the first journal article, in which two minor errors in the proofs of Lemma 8 and Lemma 9 have been corrected.
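As a rough illustration of the "number of local improvements" studied above, here is a minimal sketch of 2-Opt on Euclidean points that counts improving steps until a local optimum is reached. The pivot rule (apply the first improving exchange found), the numerical tolerance, and all function names are assumptions made for this sketch; the bounds in the abstract hold for any improvement path, not only the one this code happens to follow.

```python
import math
import random


def random_instance(n, rng):
    """n points placed uniformly at random in the unit square (the uniform model, phi = 1)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def tour_length(points, tour):
    """Euclidean length of the closed tour visiting points in the given order."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def two_opt(points, tour):
    """Apply improving 2-Opt steps until none exists; return (tour, number of steps)."""
    tour = list(tour)
    n = len(tour)
    steps = 0
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges share a vertex, so there is nothing to exchange
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Replace edges (a, b) and (c, d) by (a, c) and (b, d) if that shortens the tour.
                gain = math.dist(a, b) + math.dist(c, d) - math.dist(a, c) - math.dist(b, d)
                if gain > 1e-12:  # tolerance guards against floating-point cycling
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    steps += 1
                    improved = True
    return tour, steps


if __name__ == "__main__":
    rng = random.Random(1)
    pts = random_instance(50, rng)
    start = list(range(len(pts)))
    final, steps = two_opt(pts, start)
    print("steps:", steps, "length:", tour_length(pts, final))
```

Every applied step strictly shortens the tour, so the loop terminates; the analysis in the abstract concerns how many such steps can occur in expectation.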